L a Probabilistic Model for Text Categorization: Based on a Single Random Variable with Multiple Values 1
نویسندگان
چکیده
Text categorization is the classication of documents with respect to a set of predened categories. In this paper, we propose a new probabilistic model for text categorization, that is based on a Single random Variable with Multiple Values (SVMV). Compared to previous probabilistic models, our model has the following advantages; 1) it considers within-document term frequencies, 2) considers term weighting for target documents, and 3) is less aected by having insucient training cases. We verify our model's superiority over the others in the task of categorizing news articles from the \Wall Street Journal".
منابع مشابه
A Probabilistic Model for Text Categorization: Based on a Single Random Variable with Multiple Values
Text categorization is the classification of documents with respect to a set of predefined categories. In this paper, we propose a new probabilistic model for text categorization, that is based on a Single random Variable with Multiple Values (SVMV). Compared to previous probabilistic models, our model has the following advantages; 1) it considers within-document term frequencies, 2) considers ...
متن کاملSupport vector regression with random output variable and probabilistic constraints
Support Vector Regression (SVR) solves regression problems based on the concept of Support Vector Machine (SVM). In this paper, a new model of SVR with probabilistic constraints is proposed that any of output data and bias are considered the random variables with uniform probability functions. Using the new proposed method, the optimal hyperplane regression can be obtained by solving a quadrati...
متن کاملA Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies name entities in a text. Three popular methods have been conventionally used namely: rule-based, machine-learning-based and hybrid of them to extract named entities from a text. Machine-learning-based methods have good performance in the Persian language if they are trained with good features. To get good performanc...
متن کاملImproving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA
With the explosive growth in amount of information, it is highly required to utilize tools and methods in order to search, filter and manage resources. One of the major problems in text classification relates to the high dimensional feature spaces. Therefore, the main goal of text classification is to reduce the dimensionality of features space. There are many feature selection methods. However...
متن کاملA discrete particle swarm optimization algorithm with local search for a production-based two-echelon single-vendor multiple-buyer supply chain
This paper formulates a two-echelon single-producer multi-buyer supply chain model, while a single product is produced and transported to the buyers by the producer. The producer and the buyers apply vendor-managed inventory mode of operation. It is assumed that the producer applies economic production quantity policy, which implies a constant production rate at the producer. The operational pa...
متن کامل